MPI-2 One-Sided Usage and Implementation for Read Modify Write Operations: A Case Study with HPCC
Abstract
MPI-2’s one-sided communication interface has become prevalent in scientific applications. One of the important operations in a one-sided model is read-modify-write. MPI-2 semantics provide the MPI_Put, MPI_Get and MPI_Accumulate operations, which can be used to implement read-modify-write functionality. The different strategies yield varying performance benefits depending on the underlying one-sided implementation. In this paper we use the HPCC Random Access benchmark, which primarily uses read-modify-write operations, as a case study for evaluating the different implementation strategies. Currently this benchmark is implemented with MPI two-sided semantics. In this work we design and evaluate MPI-2 versions of the HPCC Random Access benchmark using one-sided operations. To improve performance, we explore two optimizations: (i) software-based aggregation and (ii) hardware-based atomic operations. We implement aggregation techniques using MPI_Accumulate with datatypes to improve the performance of the one-sided implementation. To study the impact of hardware capabilities provided by modern interconnects, we implement a prototype of Accumulate for MPI_SUM (Direct Accumulate) using InfiniBand’s atomic fetch-and-add operation. We evaluate our approaches on an InfiniBand cluster and analyze the benefits of software aggregation using datatypes with one-sided operations as well as the hardware-based Direct Accumulate. The software-based aggregation outperforms the basic one-sided scheme without aggregation by a factor of 4.38. The hardware-based scheme shows an improvement by a factor of 2.62 over the basic one-sided scheme. Our study shows that the software-based aggregation performs best. We also demonstrate the potential and scalability of the hardware-based approach.
Keywords: MPI-2, one-sided, HPCC, Accumulate, InfiniBand
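The idea behind software aggregation can be illustrated with a small single-process model. This is only a sketch and not the paper's implementation: it makes no MPI calls, the names `owner_and_offset`, `apply_unaggregated`, `apply_aggregated`, `NUM_RANKS` and `TABLE_SIZE_PER_RANK` are hypothetical, and it assumes an XOR update of the kind the Random Access benchmark applies to its table. Each simulated "message" stands in for one one-sided accumulate call.

```python
from collections import defaultdict

NUM_RANKS = 4            # hypothetical number of simulated processes
TABLE_SIZE_PER_RANK = 8  # hypothetical number of table entries owned per rank


def make_tables():
    """Create one zero-initialised table slice per simulated rank."""
    return [[0] * TABLE_SIZE_PER_RANK for _ in range(NUM_RANKS)]


def owner_and_offset(global_index):
    """Map a global table index to (owning rank, offset within that rank)."""
    return divmod(global_index, TABLE_SIZE_PER_RANK)


def apply_unaggregated(tables, updates):
    """Issue one simulated accumulate per update; return the message count."""
    messages = 0
    for index, value in updates:
        rank, offset = owner_and_offset(index)
        tables[rank][offset] ^= value  # read-modify-write on the target
        messages += 1
    return messages


def apply_aggregated(tables, updates):
    """Group updates by target rank, then send one simulated batch per rank."""
    batches = defaultdict(list)
    for index, value in updates:
        rank, offset = owner_and_offset(index)
        batches[rank].append((offset, value))
    for rank, batch in batches.items():
        # One batch models a single accumulate that carries many elements.
        for offset, value in batch:
            tables[rank][offset] ^= value
    return len(batches)


if __name__ == "__main__":
    total = NUM_RANKS * TABLE_SIZE_PER_RANK
    updates = [((i * 7) % total, i + 1) for i in range(20)]
    plain, batched = make_tables(), make_tables()
    print("unaggregated messages:", apply_unaggregated(plain, updates))
    print("aggregated messages:", apply_aggregated(batched, updates))
    print("same final tables:", plain == batched)
```

Both paths leave the tables in the same state; the aggregated path simply issues at most one message per target rank instead of one per update, which is the source of the saving that aggregation aims for.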
Related resources
Supporting MPI-2 One Sided Communication on Multi-rail InfiniBand Clusters: Design Challenges and Performance Benefits
In cluster computing, InfiniBand has emerged as a popular high performance interconnect with MPI as the de facto programming model. However, even with InfiniBand, bandwidth can become a bottleneck for clusters executing communication intensive applications. Multi-rail cluster configurations with MPI-1 are being proposed to alleviate this problem. Recently, MPI-2 with support for one-sided commu...
Implementing Byte-Range Locks Using MPI One-Sided Communication
We present an algorithm for implementing byte-range locks using MPI passive-target one-sided communication. This algorithm is useful in any scenario in which multiple processes of a parallel program need to acquire exclusive access to a range of bytes. One application of this algorithm is for implementing MPI-IO’s atomic-access mode in the absence of atomicity guarantees from the underlying fil...
SFIO, Système de fichiers distribués pour MPI-I/O
This paper presents the design and evaluation of a Striped File I/O (SFIO) library for parallel I/O in an MPI environment. We present techniques for optimizing communications and disk accesses for small striping factors. Using MPI derived datatype capabilities, we transmit fragmented data over the network by single MPI transfers. We present first results regarding the I/O performance of the SFI...
Local Read-Write Operations in Sensor Networks
Designing protocols and formulating convenient programming units of abstraction for sensor networks is challenging due to communication errors and platform constraints. This paper investigates properties and implementation reliability for a local read-write abstraction. Local read-write is inspired by the class of read-modify-write operations defined for shared-memory multiprocessor architectur...
PGAS Models using an MPI Runtime: Design Alternatives and Performance Evaluation
Programming models play a critical role in designing scalable applications. In the past few decades, MPI [3] has become the de facto programming model for writing parallel applications. At the same time, alternative programming models such as Partitioned Global Address Space (PGAS) programming models are gaining traction due to the asynchrony, ability to read/write distributed data structures a...